1. Introduction



This paper deals with the parametrization and fitting of a class of marginal independence
models for multivariate discrete distributions. These models are associated with a class
of graphs where the missing edges represent marginal independence. The graphs used
have special edges to distinguish them from undirected graphs used to encode conditional
independencies. Cox & Wermuth (1993) use dashed edges and call the graphs covariance
graphs, stressing the equivalence between a marginal pairwise independence and a zero
covariance in a Gaussian distribution. Richardson & Spirtes (2002) use instead bi-directed
edges following the tradition of path analysts. The interpretation of the graphs in terms of
independencies is based on the pairwise and global Markov properties discussed originally
by Kauermann (1996) for covariance graphs and later developed by Richardson (2003).
These are recalled in Section 2.

Models of marginal independence can be useful in several contexts. For instance,
Cox & Wermuth (1993) present an example on diabetic patients concerning four continuous
variables: $X_1$, the duration of the illness, $X_2$, the quantity of a particular metabolic
parameter, $X_3$, a score for the knowledge about the illness, and $X_4$, a questionnaire
score measuring a patient's attitude called external fatalism. The structure of the
correlation matrix suggests for this data set the marginal independencies
$X_4 \perp\!\!\!\perp \{X_1, X_2\}$ and $X_1 \perp\!\!\!\perp \{X_3, X_4\}$. This marginal
independence model can be represented by the bi-directed graph in Figure 1(a), called a
4-chain. The suggested interpretation is that the duration of illness $X_1$ and the external
fatalism $X_4$ are independent explanatory variables of the responses $X_2$, $X_3$ in two
seemingly unrelated regressions. For further discussion on the interpretation of covariance
chains see Wermuth et al. (2006). Bi-directed graph models are sometimes useful to represent
marginal independence structures induced after marginalizing over latent variables. The
independence structure of the diabetes data, for example, might be represented by assuming
an underlying generating process described by a directed acyclic graph, shown in
Figure 1(b), with one latent variable pointing both to $X_2$ and $X_3$. After marginalizing
over the latent variable the induced independencies are exactly those encoded in the
bi-directed graph of Figure 1(a). As another example with four binary variables, consider
the data by Coppen (1966) shown in Table 1 concerning symptoms of 362 psychiatric patients.
The symptoms are: $X_1$: stability, $X_2$: validity,
Figure 1. (a) A bi-directed graph, called 4-chain, implying the independencies
$4 \perp\!\!\!\perp 12$ and $1 \perp\!\!\!\perp 34$. Directed acyclic graphs inducing the same
independencies after marginalization over the latent variables (drawn with a distinct node
symbol): (b) with one latent variable; (c) with 3 latent variables.
$X_3$: acute depression and $X_4$: solidity. The chi-squared tests of the hypotheses of
marginal independence $X_4 \perp\!\!\!\perp \{X_1, X_2\}$ and $X_1 \perp\!\!\!\perp \{X_3, X_4\}$,
with p-values, respectively, 0.32 and 0.14, are separately not significant and the
independence model defined by the two statements jointly gives a satisfactory fit with a
deviance of 8.61 on 5 degrees of freedom. Thus the same bi-directed graph model defined by
the 4-chain of Figure 1(a) is adequate. In Section 6 we discuss the details of this
application. In this example, if all symptoms are treated on the same footing, it is less
plausible that a single latent variable will explain the independence structure and more
(at least three) latent variables are required to suggest a generating process, as shown in
the graph of Figure 1(c).

Developing a parameterization for Gaussian bi-directed graph models is straightforward
since the pairwise and the global Markov property are equivalent and they can be simply
fulfilled by constraining to zero a subset of covariances. Accomplishing the same task
in the discrete case is much more difficult due to the high number of parameters and to
the non-equivalence of the two Markov properties. Recently, Drton & Richardson (2007)
studied the parametrization of bi-directed graph models for discrete binary distributions,
based on Moebius parameters, and proposed a version of their iterative conditional fitting
algorithm for maximum likelihood estimation.

In this paper we propose different parameterizations, suitable for general categorical
variables, based on the class of marginal log-linear models of Bergsma & Rudas (2002).
One special case of this class, especially useful in the context of bi-directed graph
models, is the multivariate logistic parameterization of Glonek & McCullagh (1995); see also
Kauermann (1997). We discuss a further marginal log-linear parametrization that can,
in special cases, be shown to imply variation independent parameters. We show that the
marginal log-linear parameterizations suggest a class of reduced models defined by
constraining certain higher-order log-linear parameters to zero. Then we discuss maximum
likelihood estimation of the models and we propose a general algorithm based on previous
works by Aitchison & Silvey (1958), Lang (1996) and Bergsma (1997).

The remainder of this paper is organized as follows. Section 2 reviews discrete
bi-directed graphs and their Markov properties. In Section 3 we give the essential results
concerning the theory of marginal log-linear models. Two parameterizations of bi-directed
graph models are then given in Section 4, illustrating their properties with special emphasis
on variation independence and the interpretation of the parameters. In Section 5 we



Table 1. Data by Coppen (1966) on symptoms of psychiatric patients.
The variables are $X_1$: stability (1=extroverted, 2=introverted), $X_2$: validity
(1=psychasthenic, 2=energetic), $X_3$: depression (yes, no), $X_4$: solidity
(1=hysteric, 2=rigid).
propose an algorithm for maximum likelihood fitting and then, in Section 6, we provide
some examples. Finally, in Section 7 we give a short discussion, with a comparison with
the approach by Drton & Richardson (2007).



2. Discrete bi-directed graph models 

Bi-directed graphs are essentially undirected graphs with edges represented by bi-directed
arrows instead of full lines. We review in this section the main concepts of
graph theory required to understand the models. A bi-directed graph is a pair
$G = (V, E)$, where $V = \{1, \dots, d\}$ is a set of nodes, and $E$ is a set of edges defined by
two-element subsets of $V$. Two nodes $u$, $v$ are adjacent or neighbours if $uv$ is an edge of
$G$ and in this case the edge is drawn as bi-directed, $u \leftrightarrow v$. Two edges are adjacent if
they have an end node in common. A path from a node $u$ to a node $v$ is a sequence of
adjacent edges connecting $u$ and $v$ for which the corresponding sequence of nodes contains
no repetitions. The nodes $u$ and $v$ are called the endpoints of the path and all the other
nodes are called the inner nodes.

A graph $G$ is complete if all its nodes are pairwise adjacent. A non-empty graph $G$ is
called connected if any two of its nodes are linked by a path in $G$, otherwise it is called
disconnected. If $A$ is a subset of the node set $V$ of $G$, the graph $G_A$ with nodes $A$ and
containing all the edges of $G$ with endpoints in $A$ is called an induced subgraph. If a
subgraph $G_A$ is connected (resp. disconnected, complete) we also call $A$ connected (resp.
disconnected, complete) in $G$. The set of all disconnected sets of the graph $G$ will be
denoted by $\mathcal{D}$, and the set of all the connected sets of $G$ will be denoted by $\mathcal{C}$. In a graph
$G$ a connected component, or simply a component, is a maximal connected subgraph. If a
subset $D$ of nodes is disconnected then it can be uniquely decomposed into its connected
components $C_1, \dots, C_r$, say, such that $D = C_1 \cup \cdots \cup C_r$.

The usual notion of separation in undirected graphs can be used also for bi-directed
graphs. Thus, given three disjoint subsets of nodes $A$, $B$ and $C$, $A$ and $B$ are said to be
separated by $C$ if for any $u$ in $A$ and any $v$ in $B$ all paths from $u$ to $v$ have at least one
inner node in $C$. The cardinality of a set $V$ will be denoted by $|V|$. The set of all the
subsets of $V$, the power set, will be denoted by $\mathcal{P}(V)$. We also use the notation $\mathcal{P}_0(V)$ for
the set of all nonempty subsets of $V$.

Let $X = (X_v, v \in V)$ be a discrete random vector with each component $X_v$ taking
on values in the finite set $\mathcal{I}_v = \{1, \dots, b_v\}$. The Cartesian product
$\mathcal{I}_V = \times_{v \in V} \mathcal{I}_v$ is a contingency table, with generic element
$i = (i_v, v \in V)$, called a cell of the table, and with total number of cells
$t = |\mathcal{I}_V|$. We assume that $X$ has a joint probability function $p(i)$,
$i \in \mathcal{I}_V$, giving the probability that an individual falls in cell $i$. Given a subset
$M \subseteq V$ of the variables, the marginal contingency table is
$\mathcal{I}_M = \times_{v \in M} \mathcal{I}_v$ with generic cell $i_M$ and the marginal probability
function of the random vector $X_M = (X_v, v \in M)$ is

$p_M(i_M) = \sum_{j \in \mathcal{I}_V :\, j_M = i_M} p(j).$

A bi-directed graph $G = (V, E)$ induces an independence model for the discrete random
vector $X = (X_v, v \in V)$ by defining a Markov property, i.e. a rule for reading off the
graph the independence relations. In the following we shall use the shorthand notation
$A \perp\!\!\!\perp B \mid C$ to indicate the conditional independence
$X_A \perp\!\!\!\perp X_B \mid X_C$, where $A$, $B$ and $C$ are three disjoint subsets of $V$.
Similarly $A \perp\!\!\!\perp B$ and $A \perp\!\!\!\perp B \perp\!\!\!\perp C$ will denote the
marginal and the complete independence, respectively, of sub-vectors of $X$. There are two
Markov properties describing the independence model associated with a bi-directed graph,
which we consider in this paper: (a) the global Markov property of Kauermann (1996) and (b)
the connected set Markov property by Richardson (2003).

The distribution of the random vector $X$ satisfies the global Markov property for the
bi-directed graph $G$ if for any triple of disjoint sets $A$, $B$ and $C$,

$A \perp\!\!\!\perp B \mid V \setminus (A \cup B \cup C)$ whenever $A$ is separated from $B$ by $C$ in $G$.

Instead, the distribution of $X$ is said to satisfy the connected set Markov property if

(1) $C_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp C_r$ for every disconnected set $D \in \mathcal{D}$
with connected components $C_1, \dots, C_r$.



Richardson (2003) proves that the two properties are equivalent; see also Drton & Richardson
(2007). Following these authors we define a discrete bi-directed graph model as follows.



Definition 2.1. A discrete bi-directed graph model associated with a bi-directed graph
$G = (V, E)$ is a family of discrete joint probability distributions $p$ for the discrete random
vector $X = (X_v, v \in V)$ that satisfies the property (1) for $G$, i.e. such that, for every
disconnected set $D$ in the graph,

$p_D(i_D) = p_{C_1}(i_{C_1}) \times \cdots \times p_{C_r}(i_{C_r}),$

where $C_1, \dots, C_r$ are the connected components of $D$.

If the global Markov property holds then, for any pair of non-adjacent nodes, the associated
random variables are marginally independent. This implication is called the pairwise
Markov property and, for discrete variables, it is a necessary but not sufficient condition
for the global Markov property. This is in sharp contrast with the family of Gaussian
distributions, where the two properties are equivalent.

Example 1. Here and henceforth we shall use the short forms 34 and 12 to denote the
sets $\{3,4\}$ and $\{1,2\}$, and so on. The graph of Figure 1(a) is a chain on 4 nodes with
disconnected sets

$\mathcal{D} = \{13, 14, 24, 134, 124\}.$

Thus, $D = 13$ has the components $C_1 = 1$ and $C_2 = 3$, while $D = 134$ can be decomposed
into $C_1 = 1$ and $C_2 = 34$. The pairwise Markov property implies $1 \perp\!\!\!\perp 3$,
$1 \perp\!\!\!\perp 4$ and $2 \perp\!\!\!\perp 4$, while the connected set Markov property implies
further that $1 \perp\!\!\!\perp 34$ and $4 \perp\!\!\!\perp 12$. The global Markov property implies
the equivalent set of independence statements $1 \perp\!\!\!\perp 4$, $2 \perp\!\!\!\perp 4 \mid 1$ and
$1 \perp\!\!\!\perp 3 \mid 4$.



Note that the complete list of all marginal independencies implied by a bi-directed graph
model is derived from the class $\mathcal{D}$ of all disconnected sets of the graph.
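The enumeration of disconnected sets used in Example 1 can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names `is_connected` and `disconnected_sets` and the edge-list encoding of the graph are assumptions made here, not part of the paper.

```python
from itertools import combinations

def is_connected(nodes, edges):
    """Return True if the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            # follow an edge only if both endpoints lie in the induced subgraph
            if a == u and b in nodes and b not in seen:
                seen.add(b)
                stack.append(b)
            elif b == u and a in nodes and a not in seen:
                seen.add(a)
                stack.append(a)
    return seen == nodes

def disconnected_sets(vertices, edges):
    """All subsets of `vertices` with at least two nodes whose induced subgraph is disconnected."""
    return [frozenset(s)
            for k in range(2, len(vertices) + 1)
            for s in combinations(vertices, k)
            if not is_connected(s, edges)]

# The 4-chain of Example 1, encoded as a list of edges (hypothetical encoding).
chain4 = [(1, 2), (2, 3), (3, 4)]
D = disconnected_sets([1, 2, 3, 4], chain4)
print(sorted("".join(map(str, sorted(s))) for s in D))
```

Each disconnected set found in this way corresponds to one complete independence statement among its connected components.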
Figure 2. Two bi-directed graphs, (a) and (b). The independencies implied by the connected
set Markov property (or, equivalently, the global Markov property) are:
(a) $1 \perp\!\!\!\perp 34$, $3 \perp\!\!\!\perp 15$ and $5 \perp\!\!\!\perp 23$;
(b) $1 \perp\!\!\!\perp 3 \perp\!\!\!\perp 5$, $1 \perp\!\!\!\perp 345$, $12 \perp\!\!\!\perp 45$ and
$123 \perp\!\!\!\perp 5$.



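Example 2. The graph of Figure 2(a) has 7 disconnected sets and thus the associated
discrete bi-directed graph model fulfils the independencies

$1 \perp\!\!\!\perp 3$, $1 \perp\!\!\!\perp 4$, $2 \perp\!\!\!\perp 5$, $3 \perp\!\!\!\perp 5$,
$1 \perp\!\!\!\perp 34$, $5 \perp\!\!\!\perp 23$, $3 \perp\!\!\!\perp 15$

that reduce to $1 \perp\!\!\!\perp 34$, $3 \perp\!\!\!\perp 15$ and $5 \perp\!\!\!\perp 23$, after
eliminating redundancies. The discrete model associated with the graph of Figure 2(b), with 16
disconnected subsets, satisfies 16 marginal independencies that can be reduced to the four
statements

$1 \perp\!\!\!\perp 3 \perp\!\!\!\perp 5$, $1 \perp\!\!\!\perp 345$, $12 \perp\!\!\!\perp 45$,
$123 \perp\!\!\!\perp 5$.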

The stronger condition required by Definition 2.1 implies that in some situations not all
marginal independence relations are representable by bi-directed graphs, as the following
example shows.



Example 3. Consider the data in Table 2 due to Lienert (1970). The variables are
3 symptoms after LSD intake, recorded to be present (level 1) or absent (level 2), and
are distortions in affective behavior ($X_1$), distortions in thinking ($X_2$), and dimming of
consciousness ($X_3$). As Wermuth (1998) points out, the frequencies in the three marginal
tables show that the three symptom pairs are close to independence, but at the same
time the variables are not mutually independent, as witnessed by the strong three-factor
interaction due to the quite distinct conditional odds ratios between $X_1$ and $X_2$ at the
two levels of $X_3$. Thus, in this case, despite three marginal independencies, a discrete
bi-directed graph model can represent just one of them, and thus must include at least
two edges.



Pearl & Wermuth (1994) studied the Markov equivalence between bi-directed graph
models (actually the covariance graphs) and directed acyclic graph models, i.e. when
the two models imply exactly the same conditional independence statements, under their
respective global Markov property (for the global Markov property see Lauritzen, 1996).
They showed that each bi-directed graph is always Markov equivalent to a directed acyclic
graph with additional synthetic latent nodes, after marginalizing over the additional nodes,
as exemplified in Figure 1(b, c). Moreover they also give a Markov equivalence result,
proving that a bi-directed graph is equivalent to a directed acyclic graph with the same
set of nodes if and only if it contains no 4-chain. Thus, there is no directed acyclic graph
which is Markov equivalent to the bi-directed graphs of Figures 1(a), 2(a) or 2(b).



3. Marginal log-linear parameterizations 

Discrete bi-directed graph models may be defined as marginal log-linear models, using
complete hierarchical parameterizations as defined by Bergsma & Rudas (2002). In this
section we review the basic concepts and we discuss the definitions of the parameters
involved. Let $p(i) > 0$ be a strictly positive probability distribution of a discrete random
vector $X = (X_v, v \in V)$ and let $p_M(i_M)$ be any marginal probability distribution of a
sub-vector $X_M$, $M \subseteq V$. The marginal probability distribution admits a log-linear expansion



$\log p_M(i_M) = \sum_{L \subseteq M} \lambda^M_L(i_L),$

where $\lambda^M_L(i_L)$ is a function defining the log-linear parameters indexed by the subset $L$ of
$M$. The functions $\lambda^M_L(i_L)$ are defined by

$\lambda^M_L(i_L) = \sum_{A \subseteq L} (-1)^{|L \setminus A|} \log p_M(i_A, i^*_{M \setminus A}),$

where $i^*_{M \setminus A} = (1, \dots, 1)$ denotes a baseline cell of the table; see Whittaker (1990) and
Lauritzen (1996). The function $\lambda^M_L(i_L)$ is zero whenever at least one index in $i_L$ is equal
to 1. Therefore, $\lambda^M_L(i_L)$ defines only $\prod_{v \in L}(b_v - 1)$ parameters, where $b_v$ is the number of
categories of variable $X_v$. Due to the constraint on the probabilities, that must sum to
one, the parameter $\lambda^M_\emptyset = \log p_M(i^*_M)$ is a function of the others, and can thus be eliminated.

If $\lambda^M_L$ is the vector containing the parameters $\lambda^M_L(i_L)$, then it can be obtained explicitly
using Kronecker products as follows. For any subset $L$ of $M$, let $C_{v,L}$ be the matrix

$C_{v,L} = (-1_{b_v - 1} \;\; I_{b_v - 1})$ if $v \in L$,
$C_{v,L} = (1 \;\; 0'_{b_v - 1})$ if $v \notin L$,

and let $\pi^M$ be the $t_M \times 1$ column vector of the marginal cell probabilities in lexicographic
order. Then, the vector of the log-linear parameters $\lambda^M_L(i_L)$ is

(2) $\lambda^M_L = C^M_L \log \pi^M$, where $C^M_L = \bigotimes_{v \in M} C_{v,L}$.

Table 2. Data by Lienert (1970) concerning symptoms after LSD intake.
OR is the conditional odds-ratio between $X_1$ and $X_2$ given $X_3$. The frequencies
show evidence of pairwise independence, but mutual dependence.
For a discussion of the technique of building all log-linear parameters based on Kronecker
products see Wermuth & Cox (1992). The coding used in this paper corresponds to their
indicator coding, and gives the parameters used for example by the program GLIM.
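As a hedged illustration of equation (2), the following Python sketch builds the contrast matrix by Kronecker products and applies it to the logarithm of a marginal table. NumPy is assumed, the function names `contrast_matrix` and `loglinear_params` are hypothetical, and the variables of the margin are identified with the axes of the array (levels are 0-based in the code, so index 0 is the baseline level 1 of the text).

```python
import numpy as np
from functools import reduce

def contrast_matrix(levels, in_L):
    """C_{v,L}: (-1  I) if v is in L, (1  0') otherwise (indicator coding)."""
    if in_L:
        return np.hstack([-np.ones((levels - 1, 1)), np.eye(levels - 1)])
    return np.hstack([np.ones((1, 1)), np.zeros((1, levels - 1))])

def loglinear_params(p_M, L):
    """lambda^M_L = C^M_L log(pi^M) for a marginal table p_M with one axis per variable of M.

    L is a set of axis positions of p_M.
    """
    mats = [contrast_matrix(b, v in L) for v, b in enumerate(p_M.shape)]
    C = reduce(np.kron, mats)
    # ravel() in C order lists the cells in lexicographic order (last index fastest)
    return C @ np.log(p_M.ravel())

# Illustrative 2 x 2 marginal table (made-up numbers).
p = np.array([[0.4, 0.1],
              [0.2, 0.3]])
lam_12 = loglinear_params(p, {0, 1})   # two-factor term: the log odds ratio
terms = [loglinear_params(p, L)[0] for L in (set(), {0}, {1}, {0, 1})]
print(lam_12, sum(terms), np.log(p[1, 1]))
```

In the binary case each term has a single free value, so summing the four terms reproduces the log-probability of the cell with both variables at their second level, in agreement with the log-linear expansion.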

A marginal log-linear parameterization of the probability distribution $p(i)$ is obtained
by combining the log-linear parameters $\lambda^M_L$ for many different marginal probability
distributions. The general theory is developed in Bergsma & Rudas (2002) and is summarized
below.



Definition 3.1. Let $\mathcal{M} = (M_1, \dots, M_s)$ be an ordered sequence of margins of interest,
and, for each $M_j$, $j = 1, \dots, s$, let $\mathcal{L}_j$ be the collection of sets $L$ for which
$\lambda^{M_j}_L$ is defined with equation (2). Then, $(\lambda^{M_j}_L)$ is said to be a hierarchical and
complete marginal log-linear parameterization for $p(i)$ if (i) the sequence $M_1, \dots, M_s$ is
non-decreasing; (ii) the last margin is $M_s = V$; (iii) the sets defining the log-linear
parameters in each margin are:

$\mathcal{L}_1 = \mathcal{P}_0(M_1)$, and $\mathcal{L}_j = \mathcal{P}_0(M_j) \setminus \bigcup_{h=1}^{j-1} \mathcal{L}_h$ for $j > 1$,

where $\mathcal{P}_0(M_j)$ denotes the collection of all non-empty subsets of $M_j$.

The parameterization is called hierarchical because it is generated by a non-decreasing
sequence $\mathcal{M}$, and complete because it defines all possible log-linear parameter terms, each
within one and only one marginal table. Notice that the parameterization is associated
uniquely with a particular sequence $\mathcal{M}$ of margins. Thus, a different (still non-decreasing)
ordering of the sequence induces a different parameterization; see the examples in
Section 4.

The above construction defines a map from the simplex $\Delta_V$ of the strictly positive
distributions $p(i)$ of the discrete random vector $X$ into the set $\Lambda$ of possible values for
the whole vector of the marginal log-linear parameters $\lambda = (\lambda^{M_j}_L)$, with $j = 1, \dots, s$ and
$L \in \mathcal{L}_j$. The following general result shows that a complete hierarchical marginal log-linear
model defines a proper parameterization.



Proposition 1. (Bergsma & Rudas, 2002) The map $\Delta_V \to \Lambda \subseteq \mathbb{R}^{t-1}$ defined by a
complete and hierarchical marginal log-linear parameterization is a diffeomorphism.

The parameters $\lambda$ can be written in matrix form

$\lambda = C \log(T\pi)$

where $\pi$ is the $t \times 1$ vector of all the cell probabilities in lexicographical order, $T$ is an
$m \times t$ marginalization matrix such that $T\pi$ stacks the vectors of marginal probabilities
$\pi^{M_1}, \dots, \pi^{M_s}$, and $C = \mathrm{diag}(C_j)$ is a $(t-1) \times m$ block diagonal matrix, with
$m = \sum_{j=1}^{s} |\mathcal{I}_{M_j}|$, whose $j$-th block $C_j$ collects the contrast matrices
$C^{M_j}_L$, $L \in \mathcal{L}_j$, of equation (2). For a discussion of algorithms for computing the
matrices $C$ and $T$ see Bartolucci et al. (2007), who generalize the approach by
Bergsma & Rudas (2002) to logits and higher order effects of global and continuation type,
suitable for ordinal data.
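One simple way to obtain a marginalization matrix of this kind, offered here only as a sketch and not as the algorithm of the cited references, is to stack Kronecker products of identity matrices (for variables kept in the margin) and row vectors of ones (for variables summed out). NumPy is assumed; the function name `marginalization_matrix` and the 0-based axis labels for the variables are hypothetical choices.

```python
import numpy as np
from functools import reduce

def marginalization_matrix(shape, margins):
    """Stack, for each margin M, the 0/1 matrix that maps pi to pi^M (lexicographic order)."""
    blocks = []
    for M in margins:
        # identity keeps a variable, a row of ones sums it out
        mats = [np.eye(b) if v in M else np.ones((1, b)) for v, b in enumerate(shape)]
        blocks.append(reduce(np.kron, mats))
    return np.vstack(blocks)

# Illustrative 2 x 3 table (made-up numbers) and the margins {0}, {1}, {0, 1}.
p = np.array([[0.10, 0.20, 0.15],
              [0.25, 0.05, 0.25]])
T = marginalization_matrix(p.shape, [(0,), (1,), (0, 1)])
stacked = T @ p.ravel()
print(T.shape)
print(stacked)
```

The rows of `T` number $2 + 3 + 6 = 11$ in this example, matching $m = \sum_j |\mathcal{I}_{M_j}|$.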
The log-linear parameterization and the multivariate logistic transformation represent
two special cases of marginal log-linear models. The standard log-linear parameters are
generated by $\mathcal{M} = \{V\}$. They will be denoted by $\theta_L = \lambda^V_L$ for $L \in \mathcal{P}_0(V)$ and the
whole vector of parameters by $\theta$. The parameter space coincides with $\mathbb{R}^{t-1}$ and the map
from $\pi$ to $\theta$ admits an inverse in closed form, provided that $\pi > 0$. The multivariate
logistic parameters of Glonek & McCullagh (1995) are generated by $\mathcal{M} = \mathcal{P}_0(V)$, in any
non-decreasing order. They will be denoted by $\eta^M = \lambda^M_M$, with $\eta$ representing the whole
vector. Thus the parameters $\eta^M$ correspond to the highest order log-linear parameters
within each marginal table $\mathcal{I}_M$, for each nonempty set $M \subseteq V$. The parameter space is in
general a strict subset of $\mathbb{R}^{t-1}$, except when the number of variables is $d = 2$. In general
there is no closed form inverse transforming $\eta$ back into $\pi$. The inverse operation, however,
may be accomplished using for example the iterative proportional fitting algorithm.
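The forward map from $\pi$ to the multivariate logistic parameters is straightforward to compute. The sketch below, which assumes NumPy and uses hypothetical function names and 0-based axis labels for the variables, marginalizes the joint table onto each nonempty $M$ and takes the highest-order term $\lambda^M_M$ there. The inverse map (for instance by iterative proportional fitting) is not attempted.

```python
import numpy as np
from functools import reduce
from itertools import combinations

def highest_order_term(p_M):
    """lambda^M_M: apply the contrast (-1  I) along every axis of log p_M."""
    mats = [np.hstack([-np.ones((b - 1, 1)), np.eye(b - 1)]) for b in p_M.shape]
    return reduce(np.kron, mats) @ np.log(p_M.ravel())

def multivariate_logistic(p):
    """Return {M: eta^M} for every non-empty subset M of the axes of the joint table p."""
    d = p.ndim
    eta = {}
    for k in range(1, d + 1):
        for M in combinations(range(d), k):
            others = tuple(a for a in range(d) if a not in M)
            p_M = p.sum(axis=others)          # marginal table of X_M
            eta[M] = highest_order_term(p_M)
    return eta

# Illustrative 2 x 2 joint table (made-up numbers).
p = np.array([[0.4, 0.1],
              [0.2, 0.3]])
eta = multivariate_logistic(p)
for M, value in eta.items():
    print(M, value)
```

For two binary variables the three parameters are the two marginal logits and the log odds ratio, which is the case $d = 2$ where the parameter space is all of $\mathbb{R}^{t-1}$.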

Thus, while the log-linear parameters $\theta$ are always variation independent and for any $\theta$
in $\mathbb{R}^{t-1}$ there is a unique associated joint probability distribution $\pi$, the multivariate
logistic parameters are never variation independent for $d > 2$. Thus there are vectors
$\eta$ in $\mathbb{R}^{t-1}$ that are not compatible with any joint probability distribution $\pi$. The latter
assertion is also implied by a further result by Bergsma & Rudas (2002), which proves
that the hierarchical and complete marginal log-linear parameterization generated by a
sequence $\mathcal{M}$ is variation independent if and only if $\mathcal{M}$ satisfies a property called ordered
decomposability. A sequence of arbitrary subsets of $V$ is said to be ordered decomposable if
it has at most two elements or if there is an ordering $M_1, \dots, M_s$ of its elements, such that
$M_i \not\subseteq M_j$ if $i > j$ and, for $k = 3, \dots, s$, the maximal elements (i.e. those not contained in
any other sets) of $\{M_1, \dots, M_k\}$ form a decomposable set. For further details and examples
about ordered decomposability see Rudas & Bergsma (2004). More properties of the
two parameterizations $\theta$ and $\eta$, connected to graphical models, will be described in
Section 4.



4. Parameterizations of discrete bi-directed graph models 

We suggest now two different marginal log-linear parameterizations of discrete bi- 
directed graph models, and we compare advantages and shortcomings. 

4.1. Multivariate logistic parameterization. It is known that the complete independence
of two sub-vectors $X_A$, $X_B$ of the random vector $X$ is equivalent to a set of zero
restrictions on multivariate logistic parameters.

Lemma 1. (Kauermann (1997), Lemma 1). If $\{A, B\}$ is a partition of $V$ and
$\eta = (\eta^M), M \in \mathcal{P}_0(V)$ is the multivariate logistic parameterization, then

$A \perp\!\!\!\perp B \iff \eta^M = 0$ for all $M \in Q$

where $Q = \{M \subseteq A \cup B : M \cap A \neq \emptyset, M \cap B \neq \emptyset\}$.

We generalize this result to complete independence of more than two random vectors.
Given a partition $\{C_1, \dots, C_r\}$ of a set $D \subseteq V$, we define

$Q(C_1, \dots, C_r) = \mathcal{P}\left(\bigcup_{k=1}^{r} C_k\right) \setminus \bigcup_{k=1}^{r} \mathcal{P}(C_k).$
This is the set of all subsets of $D$ not completely contained in a single class, i.e. containing
elements coming from at least two classes of the partition. With this notation, the set $Q$
of Lemma 1 may be denoted by $Q(A, B)$. Then we have the following result.
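The set $Q(C_1, \dots, C_r)$ can be enumerated directly from its definition. The following is a minimal sketch in Python using only the standard library; the function names `powerset` and `Q` are chosen here for illustration.

```python
from itertools import combinations

def powerset(s):
    """All subsets of s, including the empty set, as frozensets."""
    s = sorted(s)
    return {frozenset(c) for k in range(len(s) + 1) for c in combinations(s, k)}

def Q(*classes):
    """Subsets of the union of the classes that are not contained in a single class."""
    union = set().union(*classes)
    inside_one_class = set().union(*(powerset(c) for c in classes))
    return powerset(union) - inside_one_class

# Components of the disconnected set D = 134 in the 4-chain of Example 1.
q = Q({1}, {3, 4})
print(sorted("".join(map(str, sorted(m))) for m in q))
```

For the partition $\{1\}, \{3, 4\}$ the result contains the three sets 13, 14 and 134, each of which meets both classes.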

Proposition 2. Let $X = (X_v), v \in V$, be the discrete random vector with multivariate
logistic parameterization $\eta = (\eta^M), M \in \mathcal{P}_0(V)$. If $D \subseteq V$ is partitioned into the classes
$\{C_1, \dots, C_r\}$ then

$C_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp C_r \iff \eta^M = 0$ for all $M \in Q(C_1, \dots, C_r)$.

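Proof. First, use the shorthand notations $Q$ to denote the set $Q(C_1, \dots, C_r)$ and $Q_i$ to
denote the set $Q(C_i, C_{-i})$, $i = 1, \dots, r$, where $C_{-i} = D \setminus C_i$. We show that
$Q = \bigcup_{i=1}^{r} Q_i$. In fact, since $Q_i \subseteq Q$, then $\bigcup_{i=1}^{r} Q_i \subseteq Q$.
Conversely, for any $M \in Q$ there is always a class $C_i$ such that $C_i \cap M \neq \emptyset$ and,
since $M$ is not contained in a single class, also $C_{-i} \cap M \neq \emptyset$; hence, by definition,
$M \in Q_i$. Hence, for every $M \in Q$, $M \in \bigcup_{i=1}^{r} Q_i$ and thus
$Q \subseteq \bigcup_{i=1}^{r} Q_i$. Then, the complete independence
$C_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp C_r$ is equivalent to $C_i \perp\!\!\!\perp C_{-i}$
for all $i = 1, \dots, r$. By Lemma 1 applied to the sub-vector $X_D$, each independence
$C_i \perp\!\!\!\perp C_{-i}$ is equivalent to the restriction $\eta^M = 0$ for $M \in Q_i$, and the
parameters $\eta^M$ are identical to the corresponding multivariate logistic parameters for the
full random vector $X_V$. Thus, the complete independence
$C_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp C_r$ is equivalent to $\eta^M = 0$ for $M \in Q_i$,
$i = 1, \dots, r$, i.e. for $M \in \bigcup_{i=1}^{r} Q_i = Q$. □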

Proposition 2 implies that a statement of complete independence
$C_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp C_r$ is equivalent to a set of zero constraints on the
multivariate logistic parameters. The following result explains how the constraints must be
chosen in order to satisfy all the independencies required by Definition 2.1 of a
bi-directed graph model.
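A small numerical check of Proposition 2 can be sketched as follows for binary variables. The code assumes NumPy, uses made-up probabilities, identifies the variables with 0-based array axes, and computes $\eta^M$ by the inclusion-exclusion formula for $\lambda^M_M$ with baseline level 1 (index 0); the function name `eta_binary` is hypothetical.

```python
import numpy as np
from itertools import combinations

def eta_binary(p, M):
    """eta^M for binary variables, via inclusion-exclusion on the marginal table of X_M."""
    d = p.ndim
    p_M = p.sum(axis=tuple(a for a in range(d) if a not in M))
    total = 0.0
    for k in range(len(M) + 1):
        for A in combinations(range(len(M)), k):
            # variables in A at their second level, the others at the baseline level
            cell = tuple(1 if j in A else 0 for j in range(len(M)))
            total += (-1) ** (len(M) - k) * np.log(p_M[cell])
    return total

p1 = np.array([0.3, 0.7])                      # distribution of the first variable
p23 = np.array([[0.4, 0.1],
                [0.2, 0.3]])                   # joint distribution of the other two
p = p1[:, None, None] * p23[None, :, :]        # first variable independent of the pair

crossing = [(0, 1), (0, 2), (0, 1, 2)]         # the sets in Q({first}, {second, third})
for M in crossing:
    print(M, eta_binary(p, M))
print((1, 2), eta_binary(p, (1, 2)))           # within one class: not constrained
```

The three parameters indexed by sets meeting both classes vanish up to rounding error, while the parameter for the pair inside one class equals its log odds ratio.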

Proposition 3. Given a bi-directed graph $G = (V, E)$, the discrete bi-directed graph model
associated with $G$ is defined by the set of strictly positive discrete probability distributions
with multivariate logistic parameters $\eta = (\eta^M), M \in \mathcal{P}_0(V)$, such that

$\eta^M = 0$ for every $M \in \mathcal{D}$,

where $\mathcal{D}$ is the set of all disconnected sets of nodes in the graph $G$.

Proof. Given a set $D \in \mathcal{D}$, denote its connected components by $\{C_1, \dots, C_r\}$ and by $Q_D$ the
set $Q(C_1, \dots, C_r)$. First, we prove that $\mathcal{D} = \bigcup_{D \in \mathcal{D}} Q_D$. In fact, for any
$D \in \mathcal{D}$, $Q_D \subseteq \mathcal{D}$ because it is a class of disconnected subsets of $D$. Thus,
$\bigcup_{D \in \mathcal{D}} Q_D \subseteq \mathcal{D}$. Conversely, if $D \in \mathcal{D}$, then $D \in Q_D$, and thus
$\mathcal{D} \subseteq \bigcup_{D \in \mathcal{D}} Q_D$. By Definition 2.1 the independence
$C_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp C_r$ is implied for each disconnected set $D$ with
connected components $C_1, \dots, C_r$. By Proposition 2 this is equivalent to the zero
restrictions on the multivariate logistic parameters

$\eta^M = 0$, for all $M \in Q_D$, $D \in \mathcal{D}$,

i.e. for all $M \in \bigcup_{D \in \mathcal{D}} Q_D = \mathcal{D}$. □

A consequence of Proposition [3] is that all possible discrete bi-directed graphical models 
can be identified within the multivariate logistic parametrization under the zero constraints 
associated with the disconnected sets. 
Table 3. Comparison between two parameterizations of the discrete chordless
4-chain model of Figure 1(a): ($\eta$) with bi-directed edges; ($\theta$) with
undirected edges.
Example 4. The discrete model associated with the chordless 4-chain of Figure 1(a) is
defined by the multivariate logistic parameters shown in Table 3, first row. There are 5
zero constraints on the highest-order log-linear parameters of the tables 13, 14, 24, 124 and
134. There are three nonzero two-factor marginal log-linear parameters $\eta^{12}$, $\eta^{23}$ and
$\eta^{34}$ associated with the edges of the graph that may be interpreted as sets of marginal
association coefficients between the involved variables, based on the chosen contrasts.
Consider now the reduced model resulting after dropping the edge $2 \leftrightarrow 3$ and implying
the independence $12 \perp\!\!\!\perp 34$. This model can be obtained, within the same
parameterization, by the additional zero constraints on $\eta^{23}$, $\eta^{123}$, $\eta^{234}$ and $\eta^{1234}$.

While the parameters are in general not variation independent, they satisfy the upward
compatibility property, because they have the same meaning across different marginal
distributions. Using this property, we can prove the following result concerning the effect
of marginalization onto a subset $A$ of the variables. Let $G_A = (A, E_A)$ be the subgraph
induced by $A$, and let $\mathcal{D}_A$ be the set of all disconnected sets of $G_A$.

Proposition 4. If a discrete probability distribution $p(i)$ for $i \in \mathcal{I}_V$ satisfies a bi-directed
graph model defined by the graph $G = (V, E)$ then the marginal distribution $p_A(i_A)$ over
$A \subseteq V$ satisfies the bi-directed graph model defined by $G_A = (A, E_A)$ and its multivariate
logistic parameters are $\eta = (\eta^M), M \in \mathcal{P}_0(A)$, with constraints $\eta^M = 0$ for $M \in \mathcal{D}_A$.

Proof. After marginalization onto $A$, the multivariate logistic parameters associated with
$p_A(i_A)$, by the property of upward compatibility, are $(\eta^M, M \in \mathcal{P}_0(A))$. Some of these
parameters are zero by the constraints implied by the original bi-directed graph model,
i.e. $\eta^M = 0$ for $M \in \mathcal{D} \cap \mathcal{P}_0(A)$. The result is proved by showing that
$\mathcal{D} \cap \mathcal{P}_0(A) = \mathcal{D}_A$. First, we note that if $D \subseteq A \subseteq V$, then the graph
$G_D = (D, E_D)$ with edges $E_D = (D \times D) \cap E = (D \times D) \cap E_A$ is a subgraph of both
$G_A$ and $G$. Thus, if $D \subseteq A$ and $D \in \mathcal{D}$ then the induced subgraph $G_D$ is disconnected
and, being also a subgraph of $G_A$, $D$ is also a disconnected set of $G_A$. Thus
$\mathcal{D} \cap \mathcal{P}_0(A) \subseteq \mathcal{D}_A$. Conversely, if $D$ is a disconnected set of $G_A$, then the
subgraph $G_D$ is disconnected and, being a subgraph of $G$, $D$ is also a disconnected set of
$G$. Thus $\mathcal{D}_A \subseteq \mathcal{D} \cap \mathcal{P}_0(A)$, and the result follows. □

Discrete bi-directed graph models in the multivariate logistic parameterization can be
compared with discrete log-linear graphical models represented by undirected graphs with
the same skeleton (i.e. with the same set $E$). To facilitate the comparison we state the
following well-known result, following from the Hammersley and Clifford theorem (see
Lauritzen, 1996, p. 36), which is the undirected graph model counterpart of Proposition 3.
Proposition 5. Given an undirected graph $G = (V, E)$, a discrete graphical log-linear
model associated with $G$ is defined by the set of strictly positive discrete probability
distributions with log-linear parameters $\theta = (\theta_L, L \in \mathcal{P}_0(V))$, such that

$\theta_L = 0$ for every $L \in \mathcal{N}$,

where $\mathcal{N}$ is the set of all incomplete subsets of nodes in the graph $G$.

The set $\mathcal{D}$ of all disconnected sets of a graph $G$ is included in the set $\mathcal{N}$ of the incomplete
sets, and therefore the number of zero restrictions of the undirected graph model is never
smaller, and is typically higher, than the number of zero restrictions of the bi-directed
graph model with the same skeleton (see Drton & Richardson, 2007).



Example 5. A discrete undirected graph model for the 4-chain implies the independencies
$12 \perp\!\!\!\perp 4 \mid 3$ and $1 \perp\!\!\!\perp 34 \mid 2$ and is defined by zero constraints on 8
log-linear parameters $\theta_L$ shown in Table 3, second row. Also, Proposition 5 implies that in
the discrete undirected graph model the general hierarchy principle holds, i.e. if a
particular log-linear term is zero then all higher terms containing the same set of subscripts
are also set to zero. On the contrary, by Proposition 3, in the multivariate logistic
parameterization of the bi-directed graph model the hierarchy principle is violated because a
superset of a disconnected set may be connected. Thus, for instance in the example shown in
Table 3 there are zero pairwise associations, like $\eta^{13} = 0$, but nonzero higher order
log-linear parameters like $\eta^{123} \neq 0$ and $\eta^{1234} \neq 0$.
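The counts quoted in Examples 4 and 5 can be reproduced by enumerating the incomplete and the disconnected sets of the 4-chain. This is an illustrative sketch in standard-library Python; the helper names and the representation of edges as frozensets are assumptions made here.

```python
from itertools import combinations

def subsets(vertices, min_size=2):
    """All subsets of `vertices` with at least `min_size` elements."""
    return [frozenset(c) for k in range(min_size, len(vertices) + 1)
            for c in combinations(vertices, k)]

def is_complete(s, edges):
    """True if every pair of nodes in s is joined by an edge."""
    return all(frozenset(pair) in edges for pair in combinations(s, 2))

def is_connected(s, edges):
    """True if the subgraph induced by s is connected."""
    s = set(s)
    reached = {min(s)}
    frontier = [min(s)]
    while frontier:
        u = frontier.pop()
        for v in s - reached:
            if frozenset((u, v)) in edges:
                reached.add(v)
                frontier.append(v)
    return reached == s

V = [1, 2, 3, 4]
edges = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4)]}   # skeleton of the 4-chain
disconnected = {s for s in subsets(V) if not is_connected(s, edges)}
incomplete = {s for s in subsets(V) if not is_complete(s, edges)}

print(len(disconnected), len(incomplete), disconnected <= incomplete)
# A disconnected set with a connected superset: the hierarchy principle fails for eta.
print(frozenset({1, 3}) in disconnected, frozenset({1, 2, 3}) in disconnected)
```

The sets that are incomplete but connected (123, 234 and 1234 for the 4-chain) are exactly those constrained in the undirected model but left free in the bi-directed one.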

4.2. The disconnected sets parameterization. We discuss now another marginal log-linear
parameterization that can represent the independence constraints implied by any
discrete bi-directed graph model, but involving only those marginal tables which are
needed. This parameterization defines the log-linear parameters within the margins associated
with the disconnected sets of the graph defining the model. Specifically, given a discrete
graph model with a graph $G$, we arbitrarily order the disconnected sets of the graph
to yield a non-decreasing sequence $(D_1, \dots, D_s)$ such that $D_k \not\supseteq D_{k+1}$ for
$k = 1, \dots, s - 1$. Then, the disconnected set parameterization of the discrete bi-directed graph
model associated with $G$ is the hierarchical and complete marginal log-linear parameterization
$\lambda = (\lambda^{M_j}_L)$ generated, following Definition 3.1, by the sequence of margins

(3) $\mathcal{M}_G = (D_1, \dots, D_s)$ if $D_s = V$,
    $\mathcal{M}_G = (D_1, \dots, D_s, V)$ otherwise.



This parameterization contains by definition the log-linear parameters $\lambda^D_D = \eta^D$ for every
disconnected set $D$ and thus can define the independence model by the same constraints
of Proposition 3.
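The allocation rule of Definition 3.1, applied to a sequence of the form (3), can be sketched as follows. The code is standard-library Python; the function name `allocate_terms` and the particular non-decreasing ordering of the disconnected sets of the 4-chain are assumptions chosen for illustration, one of several admissible orderings.

```python
from itertools import combinations

def allocate_terms(margins):
    """Definition 3.1: each non-empty set L is parameterized in the first margin containing it."""
    allocation = {}
    for M in margins:
        for k in range(1, len(M) + 1):
            for L in combinations(sorted(M), k):
                # setdefault keeps the first (earliest) margin in which L appears
                allocation.setdefault(L, tuple(sorted(M)))
    return allocation

# Disconnected sets of the 4-chain in one possible non-decreasing order, followed by V.
margins = [(1, 3), (1, 4), (2, 4), (1, 2, 4), (1, 3, 4), (1, 2, 3, 4)]
alloc = allocate_terms(margins)
for L in sorted(alloc, key=lambda s: (len(s), s)):
    print("term", L, "is defined within margin", alloc[L])
```

Whatever admissible ordering is chosen, every disconnected set $D$ is allocated to its own margin, which is why $\lambda^D_D = \eta^D$ is always available for the zero constraints.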

Proposition 6. Given a bi-directed graph $G = (V, E)$, the discrete bi-directed graph model
associated with $G$ is defined by the set of strictly positive discrete probability distributions
with a disconnected set parameterization $(\lambda^{M_j}_L)$, such that

$\lambda^{M_j}_{M_j} = 0$ for every $M_j \in \mathcal{D}$,
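Table 4. Comparison of three parameterizations for the bi-directed graph
model $G$ of Figure 1(a). One-factor log-linear parameters are omitted. The
columns of parameters to be constrained to zero are marked with an asterisk.

Terms              12                      13*                     14*                    23                       24*                     34
$\eta$             $\eta^{12}$             $\eta^{13}$             $\eta^{14}$            $\eta^{23}$              $\eta^{24}$             $\eta^{34}$
$\mathcal{M}_G$    $\lambda^{124}_{12}$    $\lambda^{13}_{13}$     $\lambda^{14}_{14}$    $\lambda^{1234}_{23}$    $\lambda^{24}_{24}$     $\lambda^{134}_{34}$
$\mathcal{M}'_G$   $\lambda^{124}_{12}$    $\lambda^{134}_{13}$    $\lambda^{14}_{14}$    $\lambda^{1234}_{23}$    $\lambda^{124}_{24}$    $\lambda^{134}_{34}$

Terms              123                       124*                     134*                     234                       1234
$\eta$             $\eta^{123}$              $\eta^{124}$             $\eta^{134}$             $\eta^{234}$              $\eta^{1234}$
$\mathcal{M}_G$    $\lambda^{1234}_{123}$    $\lambda^{124}_{124}$    $\lambda^{134}_{134}$    $\lambda^{1234}_{234}$    $\lambda^{1234}_{1234}$
$\mathcal{M}'_G$   $\lambda^{1234}_{123}$    $\lambda^{124}_{124}$    $\lambda^{134}_{134}$    $\lambda^{1234}_{234}$    $\lambda^{1234}_{1234}$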



where $\mathcal{D}$ is the class of all disconnected sets for $G$. Moreover, the constraints are
independent of the ordering chosen to define $\mathcal{M}_G$.

Proof. The disconnected set parameterization defined by the sequence (3) contains the
parameters $\lambda^{D}_{L}$, with $D \in \mathcal{D}$. By Definition 3.1, the class $\mathcal{L}_j$, $j = 1, \dots, s$, always contains
the set $D_j$ itself. This happens whatever ordering is used to define $\mathcal{M}_G$. Thus the parameterization
always includes $\lambda^{D}_{D} = \eta^{D}$ for every $D \in \mathcal{D}$, so it is possible to impose the constraints
$\eta^{D} = 0$ for every $D \in \mathcal{D}$, and the result follows by Proposition 3. □

While the constrained parameters defining the bi-directed graph model are actually
the same as in the multivariate logistic parameterization, the other unconstrained log-linear
parameters are defined in larger marginal tables, and thus have a different interpretation.
An important difference is that the disconnected set parameterization is tied to the specific
graph $G$ defining the model. This implies that it is not possible to define every bi-directed
graph model within the same disconnected set parameterization. A different graph $G$
implies a different sequence $\mathcal{M}_G$ of disconnected sets and thus a different list of log-linear
parameters.

Example 6. For the chordless 4-chain graph of Figure 1(a), there are several possible
orderings of the 5 disconnected sets $\mathcal{D} = \{13, 14, 24, 134, 124\}$. The discrete bi-directed
graph model is defined by choosing for example

$$\mathcal{M}_G = (13, 14, 24, 134, 124, 1234),$$

and by constraining the marginal log-linear parameters $\lambda^{D}_{D} = 0$ for $D \in \mathcal{D}$. The
unconstrained parameters differ from the multivariate logistic ones. For example, the two-factor
log-linear parameters between $X_1$ and $X_2$, $\lambda^{124}_{12}$, are defined within the marginal table
124 instead of the marginal table 12. A detailed comparison between the parameters is
reported in the first two rows of Table 4.
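
As an illustration of how the set $\mathcal{D}$ and a sequence $\mathcal{M}_G$ might be obtained mechanically, the following Python sketch enumerates the disconnected sets of the 4-chain and appends $V$ at the end. The function names and the representation of the graph as an edge list are assumptions made for this illustration; they are not taken from the original computations. Sorting by size is only one admissible non-decreasing ordering; it differs from the ordering chosen in Example 6, which by Proposition 6 does not affect the constraints.

```python
from itertools import combinations


def is_connected(nodes, edges):
    """Return True if the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            # Follow only edges with both endpoints inside `nodes`.
            if a == u and b in nodes and b not in seen:
                seen.add(b)
                stack.append(b)
            elif b == u and a in nodes and a not in seen:
                seen.add(a)
                stack.append(a)
    return seen == nodes


def disconnected_sets(vertices, edges):
    """List all subsets with at least two vertices whose induced subgraph is disconnected."""
    found = []
    for size in range(2, len(vertices) + 1):
        for subset in combinations(vertices, size):
            if not is_connected(subset, edges):
                found.append(subset)
    return found  # ordered by size, so no set is followed by one of its subsets


def margin_sequence(vertices, edges):
    """Disconnected sets followed by V, unless V itself is the last disconnected set."""
    sequence = disconnected_sets(vertices, edges)
    if not sequence or sequence[-1] != tuple(vertices):
        sequence.append(tuple(vertices))
    return sequence


# Chordless 4-chain 1 <-> 2 <-> 3 <-> 4.
chain_vertices = (1, 2, 3, 4)
chain_edges = [(1, 2), (2, 3), (3, 4)]
sets = disconnected_sets(chain_vertices, chain_edges)
sequence = margin_sequence(chain_vertices, chain_edges)
print(sets)
print(sequence)
```

For the 4-chain the enumeration returns the five sets 13, 14, 24, 124 and 134, in agreement with Example 6.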

The previous example shows that we can collect the log-linear parameters into a reduced
number of marginal tables. An alternative selection of marginal tables could be chosen
in order to fulfill the conditional independencies implied by the global Markov property.
We will describe the method in the special case of the chordless 4-chain graph. It is
conjectured that a general variation independent parameterization does not exist for all
bi-directed graphs, but the definition of a sub-class admitting such a parameterization is
still an open problem.
Example 7. In Example 1 we stated that, for the bi-directed 4-chain graph of
Figure 1(a), the global Markov property implies the conditional independencies
$1 \perp\!\!\!\perp 4$, $2 \perp\!\!\!\perp 4 \mid 1$ and $1 \perp\!\!\!\perp 3 \mid 4$. Thus, the relevant margins can be collected in the sequence

$$\mathcal{M}'_G = (14, 134, 124, 1234),$$

where the first three allow the definition of the conditional independencies and the last one
serves as completion of the parameterization. The complete hierarchical parameterization
generated by $\mathcal{M}'_G$ is slightly different from that generated by $\mathcal{M}_G$ (see Table 4, third
row), but with the 5 zero constraints on the higher-level log-linear parameters within each
margin, we obtain the required independencies



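$$1 \perp\!\!\!\perp 4 \iff \lambda^{14}_{14} = 0, \qquad 2 \perp\!\!\!\perp 4 \mid 1 \iff \lambda^{124}_{24} = \lambda^{124}_{124} = 0, \qquad 1 \perp\!\!\!\perp 3 \mid 4 \iff \lambda^{134}_{13} = \lambda^{134}_{134} = 0.$$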



Note that these independencies can also be represented by a chain graph with two
components, {1,4} and {2,3}, under the alternative Markov property (see Andersson et al.,
2001). The associated discrete model is interpreted as a system of seemingly unrelated
regressions, with two joint responses $X_2$ and $X_3$. In this context the associations of
interest are the effect parameters between every response and each explanatory variable
conditional on the remaining explanatory variable, i.e. $\lambda^{124}_{12}$, $\lambda^{124}_{24}$, $\lambda^{134}_{13}$ and $\lambda^{134}_{34}$, and
the marginal association parameters between the explanatory variables, $\lambda^{14}_{14}$. By relaxing
the constraint $\lambda^{14}_{14} = 0$ we obtain a discrete chain graph model with two complete chain
components, under the alternative Markov property.
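
The allocation of log-linear terms to margins underlying Table 4 can be sketched as follows. The sketch assumes, as the entries of Table 4 suggest, that each term is defined within the first margin of the sequence that contains it; the helper name `allocate` is hypothetical and introduced only for this illustration.

```python
from itertools import combinations


def allocate(margins, vertices):
    """Map every non-empty subset of `vertices` to the first margin containing it."""
    table = {}
    for size in range(1, len(vertices) + 1):
        for term in combinations(vertices, size):
            for margin in margins:
                if set(term) <= set(margin):
                    table[term] = margin
                    break
    return table


vertices = (1, 2, 3, 4)
# Sequences of Examples 6 and 7 for the chordless 4-chain.
m_g = [(1, 3), (1, 4), (2, 4), (1, 3, 4), (1, 2, 4), (1, 2, 3, 4)]
m_g_prime = [(1, 4), (1, 3, 4), (1, 2, 4), (1, 2, 3, 4)]

alloc = allocate(m_g, vertices)
alloc_prime = allocate(m_g_prime, vertices)

for term in sorted(alloc, key=lambda s: (len(s), s)):
    print(term, alloc[term], alloc_prime[term])
```

Under this assumption the term 12 is placed in margin 124 by both sequences, while 13 and 24 move from their own two-way margins under $\mathcal{M}_G$ to 134 and 124 under $\mathcal{M}'_G$, as in the second and third rows of Table 4.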

In the comparison between different parameterizations the property of variation
independence may also be relevant. Following Bergsma & Rudas (2002), given a discrete
bi-directed graph model, there is a variation independent parameterization if there is at least
one sequence $\mathcal{M}_G$ which is ordered decomposable. This property is quite relevant because
the lack of variation independence may make the separate interpretation of the parameters
misleading.

Example 8. In the previous example both the parameterizations based on $\mathcal{M}_G$ and $\mathcal{M}'_G$
are variation independent (unlike the multivariate logistic parameterization) because the
sequences of margins are both ordered decomposable. Consider instead the bi-directed
graph in Figure 2(a). Two possible disconnected set parameterizations of the discrete
model may be based for example on

$$\mathcal{M}_G = (13, 14, 25, 35, 134, 135, 235, 12345),$$
$$\mathcal{M}'_G = (13, 35, 135, 14, 25, 134, 235, 12345),$$

with the constraints $\lambda^{D}_{D} = 0$ for any disconnected set $D$. In this case we can verify that
only the sequence $\mathcal{M}'_G$ is ordered decomposable and thus implies variation independent
parameters.
5. Maximum likelihood estimation of discrete bi-directed graph models



We now study the maximum likelihood estimation of the discrete bi-directed graph
models under any of the parameterizations previously discussed. Assuming a multinomial
sampling scheme with sample size $N$, each individual falls in a cell $i$ of the given
contingency table $\mathcal{I}_V$ with probability $p(i) > 0$. Let $n(i)$ be the cell count and let
$n = (n(i), i \in \mathcal{I}_V)$ be a $t \times 1$ vector. Thus, $n$ has a multinomial distribution with
parameters $N$ and $\pi$. If $\mu = N\pi > 0$ is the expected value of $n$ and $\omega = \log \mu$, then for any
appropriate marginal log-linear parameterization $\lambda$ we have
$\lambda = C \log(T\pi) = C \log(T \exp(\omega))$, because the contrasts of marginal probabilities are equal
to the contrasts of expected counts. Given a discrete bi-directed graph model defined by the
graph $G = (V, E)$, if $\lambda$ is defined either by the multivariate logistic parameterization or by
the disconnected set parameterization, we can always split $\lambda$ into two components
$\lambda_{\mathcal{D}}$ and $\lambda_{\mathcal{C}}$ indexed by the disconnected sets $\mathcal{D}$ and by the connected sets $\mathcal{C}$ of the
graph, respectively. If $C_{\mathcal{D}}$ is a sub-matrix of the contrast matrix $C$, obtained by selecting
the rows associated with the disconnected sets of the graph $G$, then

$$\lambda_{\mathcal{D}} = C_{\mathcal{D}} \log(T \exp(\omega)) = h(\omega),$$

where $C_{\mathcal{D}}$ has dimensions $q \times v$ with $q = \sum_{D \in \mathcal{D}} \prod_{v \in D} (b_v - 1)$, $b_v$ being the number of
levels of variable $v$. Thus, the kernel of the log-likelihood function of the discrete
bi-directed graph model is defined by



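$$\ell(\omega; n) = n^{T}\omega - 1^{T}\exp(\omega), \qquad \omega \in \Omega_{BG}, \tag{4}$$

with

$$\Omega_{BG} = \{\omega \in \mathbb{R}^{t} : h(\omega) = 0,\ 1^{T}\exp(\omega) = N\}.$$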
Note that (4) defines a curved exponential family model, as the set $\Omega_{BG}$ is a smooth
manifold in the space $\mathbb{R}^{t}$ of the canonical parameters $\omega$. Maximum likelihood estimation
is a constrained optimization problem and the maximum likelihood estimate is a saddle
point of the Lagrangian log-likelihood



$$\ell(\omega, \tau; n) = n^{T}\omega - 1^{T}\exp(\omega) + \tau^{T} h(\omega),$$



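where $\tau$ is a $q \times 1$ vector of unknown Lagrange multipliers. To solve the equations we
propose an iterative procedure inspired by Aitchison & Silvey (1958), Lang (1996) and
Bergsma (1997). Define first

$$\xi = \begin{pmatrix} \omega \\ \tau \end{pmatrix}, \qquad f(\xi) = \frac{\partial \ell}{\partial \xi}, \qquad F(\xi) = -E\left(\frac{\partial^{2} \ell}{\partial \xi\, \partial \xi^{T}}\right) = \begin{pmatrix} D_{\mu} & -H \\ \cdot & 0 \end{pmatrix},$$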

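where the dot is a shortcut to denote a symmetric sub-matrix. Differentiating the
Lagrangian with respect to $\omega$ and $\tau$ and equating the result to zero we obtain

$$f(\xi) = \begin{pmatrix} e + H\tau \\ h(\omega) \end{pmatrix} = 0, \tag{5}$$

where $e = \partial \ell / \partial \omega = n - \mu$, $H = \partial h^{T} / \partial \omega = D_{\mu} T^{T} D_{T\mu}^{-1} C_{\mathcal{D}}^{T}$, and $D_{T\mu}$ and $D_{\mu}$ are diagonal
matrices with nonzero elements $T\mu$ and $\mu$, respectively.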
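
To make the mapping $\lambda = C \log(T\pi)$ and the derivative matrix concrete, the following Python sketch evaluates them for a $2 \times 2$ table. It is only an illustration under stated assumptions: the cell probabilities, the cell ordering and the function names are invented for the example, the contrasts are baseline logits and one log odds ratio, and the full contrast matrix $C$ is used in place of $C_{\mathcal{D}}$. It checks two facts used in this section: contrasts of marginal probabilities equal contrasts of expected counts, and each row of the transposed derivative matrix sums to zero.

```python
import math

# Cells of a 2 x 2 table in the order (1,1), (1,2), (2,1), (2,2); illustrative values.
pi = [0.1, 0.2, 0.3, 0.4]

# T stacks the marginalization matrices for the margins {1}, {2} and {1,2}.
T = [
    [1, 1, 0, 0],  # P(X1 = 1)
    [0, 0, 1, 1],  # P(X1 = 2)
    [1, 0, 1, 0],  # P(X2 = 1)
    [0, 1, 0, 1],  # P(X2 = 2)
    [1, 0, 0, 0],  # joint table, cell (1,1)
    [0, 1, 0, 0],  # cell (1,2)
    [0, 0, 1, 0],  # cell (2,1)
    [0, 0, 0, 1],  # cell (2,2)
]

# C applies baseline contrasts: two marginal logits and the log odds ratio.
C = [
    [-1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, -1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, -1, -1, 1],
]


def marginal_loglinear(C, T, x):
    """Evaluate C log(T x) for a vector x of probabilities or expected counts."""
    tx = [sum(t * xi for t, xi in zip(row, x)) for row in T]
    return [sum(c * math.log(v) for c, v in zip(row, tx)) for row in C]


def jacobian_transpose(C, T, mu):
    """Evaluate C D_{T mu}^{-1} T D_mu, one row per parameter."""
    tmu = [sum(t * m for t, m in zip(row, mu)) for row in T]
    return [
        [sum(C[k][r] * T[r][i] / tmu[r] for r in range(len(T))) * mu[i]
         for i in range(len(mu))]
        for k in range(len(C))
    ]


N = 50
mu = [N * p for p in pi]
lam_pi = marginal_loglinear(C, T, pi)   # parameters from probabilities
lam_mu = marginal_loglinear(C, T, mu)   # same parameters from expected counts
row_sums = [sum(row) for row in jacobian_transpose(C, T, mu)]
print(lam_pi, lam_mu, row_sums)
```

The row sums vanish because every row of $C$ sums to zero, which is the reason the sample size constraint needs no separate Lagrange multiplier.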

Let $\hat\omega$ be a local maximum of the likelihood subject to the constraint $h(\omega) = 0$. A
classical result (Bertsekas, 1982) is that if $H$ is of full column rank at $\hat\omega$, there is a unique
$\hat\tau$ such that $f(\hat\omega, \hat\tau) = 0$. In the sequel, it is assumed that the maximum likelihood estimate
$\hat\omega$ is a solution to equation (5). Note that the constraint $1^{T}\hat\mu = 1^{T} n$ is automatically
satisfied, as it can be verified that $H^{T} 1 = 0$ and thus from (5) it follows that $1^{T} e = 0$.
Aitchison and Silvey propose a Fisher-score-like updating function

$$\xi^{(k+1)} = u(\xi^{(k)}), \quad \text{with} \quad u(\xi) = \xi + F^{-1}(\xi) f(\xi), \tag{6}$$

yielding the estimate $\xi^{(k+1)}$ at cycle $k + 1$ from that at cycle $k$. As the algorithm does
not always converge when starting estimates are not close enough to $\hat\xi$, it is necessary to
introduce a step size into the updating equation. The standard approach to choosing a
step size in optimization problems is to use a value for which the objective function to be
maximized increases. However, since in this case we are looking for a saddle point of
the Lagrangian likelihood $\ell$, we need to adjust the standard strategy. First, the matrix $F$
has a special structure with $F_{\omega\omega} = D_{\mu}$, $F_{\omega\tau} = -H$ and $F_{\tau\tau} = 0$. Thus, indicating the
sub-matrices of $F^{-1}$ by superscripts, we have $F_{\tau\omega} F^{\omega\tau} = I$ and $F^{\omega\omega} F_{\omega\tau} = 0$. Thus the
updating function $u(\xi)$ of (6) can be rewritten as follows

$$\omega^{(k+1)} = \omega^{(k)} + F^{\omega\omega(k)} e^{(k)} + F^{\omega\tau(k)} h^{(k)}, \qquad \tau^{(k+1)} = F^{\tau\omega(k)} e^{(k)} + F^{\tau\tau(k)} h^{(k)},$$

neither of which is a function of $\tau$. As the updating of the Lagrange multipliers does not
depend on the estimate of $\tau$ at the previous step, the algorithm essentially searches in the
space of $\omega$. Hence, inserting a step size is only required for updating $\omega$ and we propose,



following Bergsma (1997), to use the following basic updating equation with an added
step size, $0 < step^{(k)} \le 1$:

$$\omega^{(k+1)} = \omega^{(k)} + step^{(k)} \{F^{\omega\omega(k)} e^{(k)} + F^{\omega\tau(k)} h^{(k)}\},$$

where $e^{(k)} = n - \mu^{(k)}$ and where $F^{\omega\omega(k)}$ and $F^{\omega\tau(k)}$ are two sections of $F^{-1}$ at cycle
$k$. We chose the step size by a simple step-halving criterion, but more sophisticated step
size rules could also be considered. A discussion on the choice of the step size may be
found in Bergsma (1997). Note that the algorithm's updates take place in the rectangular
space $\mathbb{R}^{t}$ of $\omega$ rather than the not necessarily rectangular space $\Lambda$ of the marginal
log-linear parameters, which may not be variation independent. The algorithm converges if
it is started from suitable initial estimates of $\omega$ and $\tau$. While usually a zero vector is
a good choice for $\tau$, we found empirically that the number of iterations to convergence
can be reduced substantially by using as a starting value for $\omega$ an approximate maximum
likelihood estimate based on results by Cox & Wermuth (1990) and Roddam (2004). At
convergence, we obtain the maximum likelihood estimates $\hat\mu = \exp(\hat\omega)$ and $\hat\pi = N^{-1}\hat\mu$
and the asymptotic covariance matrices

$$\operatorname{cov}(\hat\omega) = \hat F^{\omega\omega}, \qquad \operatorname{cov}(\hat\lambda) = H_{sat}^{T} \hat F^{\omega\omega} H_{sat}, \qquad \text{with } H_{sat} = D_{\mu} T^{T} D_{T\mu}^{-1} C^{T}.$$
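
The iteration can be illustrated on the simplest marginal independence model, independence in a $2 \times 2$ table, where $h(\omega)$ is the log odds ratio. The Python sketch below is a minimal illustration under several assumptions that are not part of the procedure described above: the counts are invented, the starting value is $\omega = \log n$ rather than an approximate maximum likelihood estimate, the step size is fixed at 1 with no step halving, the number of cycles is fixed, and the linear system is solved by a small Gaussian elimination helper named `solve`. Each cycle solves $F (\delta, \tau)^{T} = (e, h)^{T}$, whose first block $\delta$ is the increment of $\omega$.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def constraint(omega, C, T):
    """Return h = C log(T exp(omega)) and H^T = C D_{T mu}^{-1} T D_mu."""
    mu = [math.exp(w) for w in omega]
    tmu = [sum(T[r][i] * mu[i] for i in range(len(mu))) for r in range(len(T))]
    h = [sum(C[k][r] * math.log(tmu[r]) for r in range(len(T))) for k in range(len(C))]
    ht = [[sum(C[k][r] * T[r][i] / tmu[r] for r in range(len(T))) * mu[i]
           for i in range(len(mu))] for k in range(len(C))]
    return h, ht


def fit(n, C, T, cycles=25, step=1.0):
    """Run a fixed number of updating cycles and return the fitted expected counts."""
    t, q = len(n), len(C)
    omega = [math.log(x) for x in n]  # assumed start: saturated fit, needs positive counts
    for _ in range(cycles):
        mu = [math.exp(w) for w in omega]
        e = [n[i] - mu[i] for i in range(t)]
        h, ht = constraint(omega, C, T)
        # Assemble F = [[D_mu, -H], [-H^T, 0]].
        F = [[0.0] * (t + q) for _ in range(t + q)]
        for i in range(t):
            F[i][i] = mu[i]
            for k in range(q):
                F[i][t + k] = -ht[k][i]
                F[t + k][i] = -ht[k][i]
        sol = solve(F, e + h)  # first t entries: increment of omega; last q: multipliers
        omega = [omega[i] + step * sol[i] for i in range(t)]
    return [math.exp(w) for w in omega]


counts = [10.0, 20.0, 30.0, 40.0]          # cells (1,1), (1,2), (2,1), (2,2)
C_D = [[1.0, -1.0, -1.0, 1.0]]             # log odds ratio in the joint table
T = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
mu_hat = fit(counts, C_D, T)
print(mu_hat)
```

For this model the maximum likelihood estimate is known in closed form (row total times column total divided by $N$), so the result can be compared with the expected counts 12, 18, 28 and 42.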

6. Analysis of some examples 

The examples of this section illustrate both the parameterizations and the fitting of 
marginal independence models. It is rare that a pure marginal independence model is 
useful in isolation and thus usually it is interpreted in combination with other graphical 
models. However, the problem of simultaneous testing of multiple marginal independencies 
in a general contingency table is often present in applications and it can be carried out 
Table 5. Parameter estimates of the 4-chain model for the data on symptoms
of psychiatric patients under the multivariate logistic and the disconnected
set parameterizations. The fit is $\chi^2_5 = 8.61$. Columns (1) and (2)
are studentized estimates.

Multivariate logistic param.      Disconnected set param.
Margin   $\eta$    (1)            Margin   Interaction   $\lambda$   (2)
1        -0.28    -2.62           13       1             -0.28      -2.62
2        -0.13    -1.23                    3              0.21       1.95
3         0.21     1.95                    13             0.00
4         0.24     2.31           14       4              0.24       2.31
12       -0.72    -3.47                    14             0.00
13        0.00                    24       2             -0.13      -1.23
14        0.00                             24             0.00
23       -1.12    -5.32           124      12            -0.72      -3.47
24        0.00                             124            0.00
34        0.79     3.80           134      34             0.79       3.80
123       0.16     0.36                    134            0.00
124       0.00                    1234     23            -0.78      -1.80
134       0.00                             123            0.14       0.20
234      -0.90    -2.03                    234           -1.02      -1.63
1234      0.15     0.16                    1234           0.15       0.16



with the technique discussed in this paper. All the computations were programmed in the
R language (R Development Core Team, 2007).



Example 9. The 4-chain marginal independence model was fitted to the data on symptoms
of psychiatric patients of Table 1 with the algorithm of Section 5. After 22 iterations,
the algorithm leads to a chi-squared goodness of fit of 8.61 on 5 degrees of freedom. By
comparison, the best graphical log-linear model has generators [12][234] with a deviance
of 8.4 on 6 degrees of freedom. Thus, both models provide adequate and interesting
interpretations of the data. Table 5 summarizes the estimates of the 4-chain graph model,
showing the parameter estimates and the studentized estimates under the multivariate
logistic and the disconnected set parameterizations. In the multivariate logistic
parameterization the two-factor parameters have the simple interpretation of marginal association
coefficients. It must be kept in mind that they measure just the strength of marginal
association between pairs of adjacent variables in the graph, but that the model includes
higher order log-linear parameters which are not visible from the graph. For instance, both
$\hat\eta^{23} = -1.12$ and $\hat\eta^{234} = -0.90$ are measures of association for variables $X_2$ and $X_3$. In
general, for any connected subgraph, all higher order log-linear parameters are expected.
As explained in Section 4, the interpretation of the parameters necessarily depends on the
chosen parameterization. For instance, $\hat\eta^{23} = -1.12$ and $\hat\lambda^{1234}_{23} = -0.78$ are a marginal
association measure and a conditional association measure respectively. The four-factor
log-linear parameter is not significant, and a simpler reduced model with the additional
Table 6. Data from U.S. General Social Survey.

         F = 1             F = 2             F = 3
A    J = 1    2    3       1    2    3       1    2    3
1      410  241   80     691  556  187     192  148   84
2       71   31    9     109   64   34      27   26   15
1      181  128   42     307  284   82      84   93   41
2       41   17    5      61   35   20      18   13    5
1       96   77   29     163  151   76      58   55   27
2       34   18    7      58   36   15      17   13    6
1       29   37    4      55   54   31      22   26   17
2       16    6    6      16   16    7      10    7    2
1      552  353  145     899  793  265     180  162   94
2       98   60   15     186  122   47      40   23   14
1      133   74   33     219  164   66      36   47   24
2       25   15    1      54   40   13      14    6    4
1      228  153   60     356  343  166      95   80   41
2       75   45   12     125  116   34      25   20   12
1       41   25   13      64   56   22      15   14   11
2       17    6    1      19   18    6       3    3    2

zero constraint on this parameter has an adequate chi-squared goodness of fit of 8.63 on
6 degrees of freedom.

The following example concerns a larger contingency table including two ordinal
variables with three levels. In the analysis these variables are treated as nominal variables
using the baseline contrasts (2). Although the nature of the variables could be handled
by using other more appropriate contrasts, as explained in Bartolucci et al. (2007), the fit
of the marginal independence model is nevertheless invariant.
Example 10. Table 6 summarizes observations for 13067 individuals on 6 variables
obtained from as many questions taken from the U.S. General Social Survey (Davis et al.,
2007) during the years 1972-2006. The variables are reported below with the original name
in the GSS Codebook:

C CAPPUN: do you favor or oppose death penalty for persons convicted of murder?
(1 = favor, 2 = oppose)

F CONFINAN: confidence in banks and financial institutions (1 = a great deal,
2 = only some, 3 = hardly any)

G GUNLAW: would you favor or oppose a law which would require a person to obtain
a police permit before he or she could buy a gun? (1 = favor, 2 = oppose)

J SATJOB: how satisfied are you with the work you do? (1 = very satisfied,
2 = moderately satisfied, 3 = a little dissatisfied, 4 = very dissatisfied). Categories 3
and 4 of SATJOB were merged together.

S SEX: gender (f, m)

A ABRAPE: do you think it should be possible for a pregnant woman to obtain legal
abortion if she became pregnant as a result of rape? (1 = yes, 2 = no)
In data sets of this kind there are a large number of missing values and the table used in
this example collects only individuals with complete observations. Therefore, the following
exploratory analysis is intended to be only an illustration with a realistic example. From
a first analysis of the data, the following marginal independencies are not rejected by the
chi-squared goodness of fit test statistic

$$F \perp\!\!\!\perp CA \ (\chi^2_6 = 6.7), \quad G \perp\!\!\!\perp JA \ (\chi^2_5 = 3.3), \quad J \perp\!\!\!\perp GS \ (\chi^2_6 = 8.1), \quad A \perp\!\!\!\perp FG \ (\chi^2_5 = 2.1),$$

and thus they suggest the independence model represented by the bi-directed graph in
Figure 3(a). Fitting this model, under the multinomial sampling assumption, we obtain
an adequate fit with a deviance of 17.29 on 17 degrees of freedom. Aitchison and
Silvey's algorithm converges after 13 iterations. The encoded independencies cannot be
represented by a directed acyclic graph model with the same observed variables, because
the graph contains at least one subgraph which is a chordless 4-chain. The disconnected
set parameterization defined by the ordered decomposable sequence

$$\mathcal{M}_G = (CF, FA, GJ, GA, JS, CFA, FGA, GJS, GJA, CFGJSA)$$

is variation independent. Instead, by searching in the class of graphical log-linear models
with the backward stepwise selection procedure of MIM (Edwards, 2000) we found a model
with a deviance of 103.16 over 110 degrees of freedom. The model graph is shown in
Figure 3(b). Other selection procedures show however that there are several equally well
fitting models. The chosen undirected graph is slightly simpler (2 edges fewer) than the
bi-directed graph. As anticipated, the number of constraints on parameters is however
much higher. From the inspection of the studentized multivariate logistic estimates, we
noticed that the higher order log-linear parameters are almost all not significant and thus
we fitted a reduced model, by further restricting to zero all the log-linear parameters of
order higher than two, obtaining a deviance of 108.34 on 118 degrees of freedom. The
estimates of the remaining nonzero two-factor log-linear parameters are shown in Table 7.
These are estimated local log odds-ratios in the selected two-way marginal tables and they
have the expected signs. By comparison, the fitted non-graphical log-linear model with
the graph of Figure 3(b), with additional zero constraints on the log-linear parameters of
order higher than two, leads to a chi-squared goodness of fit of 118.49 on 119 degrees of
freedom. Both models thus appear adequate.
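
The 17 degrees of freedom of the bi-directed graph model can be checked by counting the constrained parameters: each disconnected set $D$ contributes the product over its variables of the number of levels minus one. The short Python sketch below does this count for the sequence $\mathcal{M}_G$ of this example. The function name and the dictionary of levels are illustrative choices; the levels are read from the variable list above, with SATJOB reduced to three categories.

```python
from math import prod


def constraint_count(disconnected, levels):
    """Sum over disconnected sets of the product of (levels - 1) of their variables."""
    return sum(prod(levels[v] - 1 for v in d) for d in disconnected)


# Number of categories of each variable in Example 10.
gss_levels = {"C": 2, "F": 3, "G": 2, "J": 3, "S": 2, "A": 2}
# Disconnected sets of the graph in Figure 3(a), i.e. the margins of M_G except the full table.
gss_sets = ["CF", "FA", "GJ", "GA", "JS", "CFA", "FGA", "GJS", "GJA"]
q_gss = constraint_count(gss_sets, gss_levels)

# Binary 4-chain of Example 6 for comparison.
chain_levels = {"1": 2, "2": 2, "3": 2, "4": 2}
chain_sets = ["13", "14", "24", "134", "124"]
q_chain = constraint_count(chain_sets, chain_levels)

print(q_gss, q_chain)
```

The same count gives 5 for the binary 4-chain, matching the degrees of freedom reported in Example 9.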




Figure 3. Data from the U.S. General Social Survey 1972-2006. (a) A
bi-directed graph model ($\chi^2_{17} = 17.29$). (b) A graphical log-linear model
($\chi^2_{110} = 103.16$).
Table 7. Estimates of two-factor log-linear parameters for the bi-directed
graph model of Figure 3(a) with additional zero restrictions on higher order
terms. The asterisks indicate the parameters for which the Wald statistic
is significant.

Margin        Estimate   s.e.           Margin        Estimate   s.e.
CG    (1)     -0.38      0.048   *      FG    (1)     -0.01      0.047
CJ    (1)      0.10      0.043   *            (2)      0.16      0.058
      (2)      0.14      0.058   *      FJ    (1)      0.29      0.044
CS    (1)      0.46      0.040   *            (2)      0.05      0.065
CA    (1)      0.56      0.049   *            (3)      0.04      0.056
GS    (1)     -0.77      0.042   *            (4)      0.36      0.072
JA    (1)     -0.21      0.051   *      FS    (1)     -0.004     0.040
      (2)     -0.35      0.051   *            (2)     -0.03      0.075
SA    (1)      0.18      0.047   *













The last example shows that sometimes the best fitted marginal independence model 
may be simpler than the best fitted directed acyclic model. 

Example 11. The set of data in Table 8 is taken from the General Social Survey in
Germany in 1998 (ALLBUS, 1998). In a selected population aged between 18 and 65,
the answers of 1228 respondents are collected about the following 5 binary variables: U,
unconcerned about environment (yes, no); P, no own political impact expected (yes, no);
E, parents' education, both at lower level (at most 10 years) (yes, no); A, age under 40
years (yes, no); S, gender (female, male). A possible ordering of the variables has been
suggested by Wermuth (2003), who analyzed a superset of this data set and discussed
a directed acyclic graph model. Using a similar ordering, limited to the variables studied
here, we consider the variables {A, S} as purely explanatory, E and P as intermediate
and U as final response. Our final well-fitting directed acyclic graph model, shown in
Figure 4(a), has a deviance of 3.70 over 3 degrees of freedom. The subgraph for all the variables
except gender S is complete. Specifically, the graph has an edge $E \to U$, indicating a
direct effect of education on the final response. The model without the arrow $E \to U$ has a
worse goodness of fit ($\chi^2 = 36.0$) and further it can be verified that the two-factor log-linear
parameters EP and EA are large and significant. Model selection in the class of the
graphical log-linear models does not lead to any sensible reduction, whilst a search in the class of
bi-directed graph models shows that a special structure of marginal independencies holds.

Table 8. Data from the German General Social Survey in 1998.

               U = yes                    U = no
             S = f       S = m         S = f        S = m
A     E   P: yes   no    yes   no      yes   no     yes   no
no    yes      6    8      7   27       66  186      24  230
no    no       4    0      1    9        8   64       4   60
yes   yes      2    2     11    6       28  159      16  130
yes   no       0    1      0    2        4   75       8   80
Figure 4. Two graphical models fitted to data from the General Social
Survey in Germany, 1998. (a) A directed acyclic graph model: $\chi^2_3 = 3.70$.
(b) A bi-directed graph model: $\chi^2_5 = 5.91$.



The final selected bi-directed graph, shown in Figure 4(b), represents the marginal
independencies $S \perp\!\!\!\perp \{A, E\}$ and $E \perp\!\!\!\perp \{S, U\}$. The bi-directed graph contains the chordless
4-chain EAUS and thus it is not Markov equivalent to any directed acyclic graph in the five
variables. This suggests that the directed acyclic graph model conceals some distortions
due to the presence of latent variables. Also in this case, the disconnected set
parameterization defined by the sequence $\mathcal{M}_G = (SE, SA, EU, SAE, SEU, UPEAS)$ leads to a
variation independent parameterization because it can be verified that the sequence $\mathcal{M}_G$
is ordered decomposable.

7. Discussion 



The discrete models based on marginal log-linear models by Bergsma & Rudas (2002)
form a large class that includes several discrete graphical models. The undirected graph
models and the chain graph models under the classical (Lauritzen, Wermuth, Frydenberg)
interpretation can be parameterized as marginal log-linear models. For an introduction see
Rudas et al. (2006). This paper shows that the discrete bi-directed graph models under
the global Markov property are included in the same class by specifying the constraints
appropriately. In general, three main criteria were considered in choosing a marginal
log-linear parameterization.

(a) Upward compatibility: if the parameters have a meaning that is invariant across 
different marginal distributions, then the interpretations remain the same when a 
sub-model is chosen. We saw that the multivariate logistic parameterization has 
this property. 

(b) Modelling considerations: the parameterization should contain all the parameters
that are of interest for the problem at hand. For example, in a regression context
where some variables are prior to others, effect parameters conditional on the
explanatory variables are most meaningful. In the seemingly unrelated regression
problem of Example 7 the chosen parameters have the interpretation of logistic
regression coefficients.

(c) Variation independence: if the parameter space is the whole Euclidean space, this 
has certain advantages. First, the interpretations are simpler, because in a certain 
sense different parameters measure different things. Second, in a Bayesian context, 
prior specification is easier. Finally, the problem of out-of-bound estimates when 
transforming the parameters to probabilities is avoided. In the examples, we always 
found a variation independent parameterization, but a characterization of the class
of bi-directed graphs admitting a variation independent complete and hierarchical 
marginal log-linear parameterization is an open problem. 

The three criteria are in some cases conflicting: typically variation independence is ob- 
tained at the expense of upward compatibility. 

The multivariate logistic parameterization has a purpose similar to that of the Möbius
parameterization recently proposed by Drton & Richardson (2007) for binary marginal
independence models, which is based on a minimal set of marginal probabilities
identifying the joint distribution. These authors discuss the type of constraints on the Möbius
parameters needed to specify a marginal independence, showing that they take a simple
multiplicative form. The same constraints are defined by zero restrictions on marginal
log-linear parameters in our approach. Even if the parametric space can be awkward,
this problem is handled by a fitting algorithm that operates in the space of the expected
frequencies, while the parameters are used only to define the independence constraints.
Moreover, the definition of the models through the complete specification of the marginal
log-linear parameters gives some advantage when there is a mixture of nominal and ordinal
variables, because it allows appropriate parameters to be defined for both types of variables
using the theory of generalized marginal interactions by Bartolucci et al. (2007). This
opens the way to defining subclasses of discrete graphical models specifying equality and
inequality constraints.

The proposed algorithm for maximum likelihood fitting of the bi-directed graph model
is a very general algorithm of constrained optimization based on Lagrange multipliers. It
is essentially based on Aitchison & Silvey (1958) as later developed by Bergsma (1997).
Similar algorithms have been proposed, for instance, by Molenberghs & Lesaffre (1994),
Glonek & McCullagh (1995) and Lang (1996), and further generalized by Colombi & Forcina
(2001). Its main advantage is its generality (it can be applied to all models defined by
constraints on the marginal log-linear parameters). As previously stated, the algorithm
does not require further iterative procedures for computing, at each step, the inverse
transformation from the marginal log-linear parameters to the cell probabilities. Thus,
the risk of incompatible estimates that could arise from the lack of variation independence
is avoided. The disadvantage is that, as for many gradient-based algorithms of this type,
convergence is not guaranteed and that it requires the computation of a large expected
information matrix. However, empirically, convergence is achieved in relatively few
iterations by including a step adjustment. An alternative algorithm with convergence
guarantees is the Iterated Conditional Fitting algorithm, proposed by Drton & Richardson
(2007) for binary bi-directed graph models in the Möbius parameterization. A comparison
between the two algorithms in terms of performance, speed and memory requirements
needs further investigation.